We investigate the performance of robust estimates of multivariate
location under nonstandard data contamination models such as
componentwise outliers (i.e., contamination in each variable is
independent from the other variables). This model brings up a possible
new source of statistical error that we call "propagation of outliers."
This source of error is unusual in the sense that it is generated by
the data processing itself and takes place after the data has been
collected. We define and derive the influence function of robust
multivariate location estimates under flexible contamination models and
use it to investigate the effect of propagation of outliers. Furthermore,
we show that standard high-breakdown affine equivariant estimators
propagate outliers and therefore show poor breakdown behavior under
componentwise contamination when the dimension $d$ is high.

1. Introduction. Most statistical methods are built in the context of a 
given model and therefore are designed to perform well (e.g., be optimal) for 
this model. Models are also natural "testing grounds" for statistical proce- 
dures and therefore have a profound influence in the way data are processed 
and analyzed. 

Classical models assume data are affected by "normal" noise: small-scale
fluctuations arising from measurement errors, item-to-item differences and
other sources of "well behaved" randomness, for example, Gaussian random
variables, Gamma random variables, Poisson processes and other "nice"
random disturbances. Contamination models, on the other hand, assume
the data may also be affected by abnormal noise: large-scale fluctuations
that arise from data contamination, uneven data quality, mixed
populations, gross errors, etc. Several contamination models have been
proposed in the statistical literature. A nice discussion can be found in
Barnett and Lewis (1994).

The best known and most important contamination model is the Tukey-Huber
model [Tukey (1962) and Huber (1964)]. This model assumes that,
on average, a large fraction $(1 - \epsilon)$ of the data is generated from a
classical, normal-error-only model. The remaining data, however, can be
affected by abnormal noise. In other words, the Tukey-Huber model assumes
a mixture distribution with a fully described dominant component and an
unspecified minority component. The mixture fraction $\epsilon$ is a loosely
specified nuisance parameter (e.g., $0 < \epsilon < 0.25$). The goal of a robust
statistical analysis is to conduct inference on the dominant part of the
mixture, filtering out possible abnormal noise generated by the minority
component. The Tukey-Huber contamination model has had a profound
influence on the general strategy underlying most robust statistical
procedures: identify outlying cases (those coming from the minority
mixture component) and downweight their influence. This model also
inspired the definition of key robustness concepts such as influence
function, gross-error-sensitivity, maxbias and breakdown point.

The Tukey-Huber contamination model 

    $X = (1 - B)Y + BZ$,

was first introduced in the univariate location-scale setup. The
unobservable variables $Y$, $Z$ and $B$ are independent, $Y \sim F$ [a
well-behaved location-scale distribution such as $N(\mu, \sigma^2)$],
$Z \sim G$ (an unspecified outlier generating distribution) and
$B \sim \mathrm{Binomial}(1, \epsilon)$ (a random contamination indicator).
Consequently, the observed variable $X$ has the mixture distribution
$(1 - \epsilon)F + \epsilon G$. The model was later extended and used in other
settings including regression and multivariate location-scatter models.
See, for example, Martin, Yohai and Zamar (1989) and He, Simpson and
Portnoy (1990).
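
As a rough illustration of the univariate Tukey-Huber model, the sketch below simulates $X = (1 - B)Y + BZ$ with NumPy. The choices $F = N(0, 1)$, $G = N(10, 1)$, $\epsilon = 0.1$ and the function name `sample_tukey_huber` are assumptions made only for this example; the model itself leaves $G$ unspecified.

```python
import numpy as np

def sample_tukey_huber(n, eps, rng, outlier_loc=10.0):
    """Draw n values of X = (1 - B) Y + B Z (illustrative choices of F and G)."""
    y = rng.normal(0.0, 1.0, size=n)          # dominant component F = N(0, 1)
    z = rng.normal(outlier_loc, 1.0, size=n)  # assumed outlier distribution G
    b = rng.binomial(1, eps, size=n)          # contamination indicator B
    return (1 - b) * y + b * z, b

rng = np.random.default_rng(0)
x, b = sample_tukey_huber(100_000, 0.1, rng)
print("fraction contaminated:", b.mean())
print("sample mean:", x.mean(), "sample median:", np.median(x))
```

Under these assumed settings the sample mean is pulled toward the outliers (roughly $\epsilon \cdot 10$), while the median stays close to the center of the dominant component.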

The rest of the paper is organized as follows. In Section 2 we introduce a 
family of contamination models that includes the Tukey-Huber and compo- 
nentwise contamination models as particular cases. In Section 3 we define 
and derive the influence function of robust multivariate location estimates 
under nonstandard contamination models. In Section 4 we discuss propa- 
gation of outliers and show that standard high breakdown point (BP) ro- 
bust estimates propagate outliers. In Section 5 we investigate the breakdown 
properties under componentwise contamination of robust estimates of multi- 
variate location. Section 6 contains some concluding remarks. Some technical 
derivations and proofs are given in the Appendix. 

2. Alternative contamination frameworks. The multivariate Tukey-Huber
model, where $X$, $Y$, $Z$ are $d$-dimensional vectors, may be appropriate
for small dimensions but has serious limitations in higher dimensions. A
main criticism concerns the assumption that the majority of the cases is
free of contamination. Another criticism concerns the downweighting of
contaminated cases. When $d$ is large, the fraction of perfectly observed
cases can be rather small and the downweighting of an entire case may be
inconvenient in the case of "fat and short" data tables where the number
of variables (columns) is much larger than the number of cases (rows).

We wish to investigate the robustness properties of classical robust esti- 
mates of multivariate location under different contamination models. Sup- 
pose that the random vector Y has density 



and we are interested in estimating the multivariate location vector $\mu_0$.
However, we cannot observe $Y$ directly. Instead, we observe the random
vector

(2)    $X = (I - B)Y + BZ$,

where $B = \mathrm{diag}(B_1, B_2, \ldots, B_d)$ is a diagonal matrix,
$B_1, B_2, \ldots, B_d$ are Bernoulli random variables with
$P(B_i = 1) = \epsilon_i$ and the vector $Z$ has an arbitrary and unspecified
outlier generating distribution.

Note that in principle, the contamination indicator matrix B in model (2) 
could depend on the vector of uncontaminated observations Y. Likewise, the 
contamination vector Z could depend on both the contamination indicator 
matrix B, and the uncontaminated vector Y. In this paper, however, we 
restrict attention to the simpler case where Y, B and Z are independent. 

Different assumptions regarding the joint distribution of $B_1, B_2, \ldots, B_d$
give rise to different contamination models. For example, if
$B_1, B_2, \ldots, B_d$ are fully dependent, that is,
$P(B_1 = B_2 = \cdots = B_d) = 1$, then model (2) reduces to the classical
fully dependent contamination model (FDCM) which underlies most of the
existing robustness theory. An important feature of this model is that the
probability of an observation being noncontaminated is $1 - \epsilon$ and so,
independently of the dimension, the majority of the cases (rows in the
data table) are perfectly observed. Another important feature of this
model is that the percentage of contaminated cases is preserved under
affine equivariant transformations. Therefore, it is natural that methods
designed to perform well under FDCM are affine equivariant and check for
the possible existence of a minority of contaminated cases to downweight
their influence. Downweighting the influence of suspicious cases is a good
strategy when $d$ is relatively small, but becomes less attractive when $d$
is large. For example, downweighting an entire case may be unacceptably
wasteful if $d$ is very large and $n$ is relatively small.










F. ALQALLAF ET AL. 



Another interesting case is the fully independent contamination model
(FICM) where $B_1, B_2, \ldots, B_d$ are independent. Consider the case
$P(B_1 = 1) = \cdots = P(B_d = 1) = \epsilon$; then the probability that a case
is perfectly observed under this model is $(1 - \epsilon)^d$. Clearly, this
probability quickly decreases and goes below the critical value 1/2 as $d$
increases ($d \ge 14$ when $\epsilon = 0.05$ and $d \ge 69$ when $\epsilon = 0.01$).
Another feature of FICM is its lack of affine equivariance. While each
column in the data table has on average $(1 - \epsilon)100\%$ clean data values,
linear combinations of these columns may have a much lower percentage of
clean data values. A relevant consequence of this is that in FICM there is
a potential to propagate outliers when performing linear operations on the
original data. Outlier propagation will be discussed further in Section 4.
Intermediate contamination models that fall between FDCM and FICM are
briefly discussed in Section 6.
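
The dimension thresholds quoted for FICM can be checked directly from the clean-case probability $(1 - \epsilon)^d$. The short sketch below does this; the function names are hypothetical and chosen only for the example.

```python
def clean_case_probability(eps, d):
    """Probability that all d cells of a case are uncontaminated under FICM."""
    return (1.0 - eps) ** d

def smallest_dimension_below_half(eps):
    """Smallest d for which the clean-case probability drops below 1/2."""
    d = 1
    while clean_case_probability(eps, d) >= 0.5:
        d += 1
    return d

for eps in (0.05, 0.01):
    d = smallest_dimension_below_half(eps)
    print(eps, d, clean_case_probability(eps, d))
```

For $\epsilon = 0.05$ the loop stops at $d = 14$ and for $\epsilon = 0.01$ at $d = 69$, in line with the values in the text.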

3. The influence function. The influence function (IF) is a key robust- 
ness tool. It reveals how an estimating functional changes due to an infinites- 
imal amount of contamination [see Hampel et al. (1986)]. The IF of robust 
multivariate location estimates has only been defined under the classical 
FDCM. We wish to extend the definition so that it can be derived under 
other contamination models. 

To fix ideas, we consider the class of M-estimates of multivariate location
[see, e.g., Tatsuoka and Tyler (2000)] defined as

(3)    $\mu(H) = \arg\min_{m} E_H\{\rho[d^2(X, m, \Sigma(H))]\}$,

where

       $d^2(X, m, \Sigma) = (X - m)'\Sigma^{-1}(X - m)$

and $\Sigma(H)$ is a Fisher consistent, preliminary or simultaneous,
estimating functional of multivariate scatter. Lemma 3 of Alqallaf et al.
(2006) shows that when $X$ has an elliptical distribution, then $\mu(H)$ is
Fisher consistent under mild regularity conditions. Moreover, it is easy to
show that $\mu(H)$ satisfies the first order condition:

(4)    $E_H\{\psi[d^2(X, \mu(H), \Sigma(H))](X - \mu(H))\} = 0$,

where $\psi = \rho'$. Note that equation (4) is satisfied by large classes of
estimators for multivariate location such as M-estimators [Maronna (1976)],
S-estimators [Davies (1987), Lopuhaa (1989)], CM-estimators [Kent and
Tyler (1996)], MM-estimators [Tatsuoka and Tyler (2000), Tyler (2002)] and
$\tau$-estimators [Lopuhaa (1991)].

In order to extend the definition of influence function we must first extend
the notion of "point-mass contaminated distribution." Let $G_\epsilon$ be the
joint distribution of $(B_1, \ldots, B_d)$, let $z = (z_1, \ldots, z_d)$ be a
given fixed vector in $\mathbb{R}^d$ and let $H_0$ be the distribution with
density given by (1). Call $H(\epsilon, z)$ the distribution of

    $X = (I - B)Y + Bz$,

where $\mathrm{diag}(B) = (B_1, \ldots, B_d) \sim G_\epsilon$ and $Y \sim H_0$ are
independent.

The role played by "point-mass contaminated distributions" in FDCM
will be played by $H(\epsilon, z)$ in our more general setup.

The influence function $\mathrm{IF}(\mu, z)$ of the estimating functional
$\mu(H)$ given by (3) will be defined and derived for contamination
configuration distributions $G_\epsilon$ satisfying (1) $P(B_i = 1) = \epsilon$
$(i = 1, \ldots, d)$; and (2) for any sequence $(j_1, j_2, \ldots, j_d)$ of zeros
and ones with $d - k$ zeros and $k$ ones:
$P(B_1 = j_1, \ldots, B_d = j_d) = \delta_k(\epsilon)$. These assumptions are
clearly satisfied in the case of FDCM and FICM. In FDCM we have
$\delta_0(\epsilon) = 1 - \epsilon$, $\delta_1(\epsilon) = \cdots = \delta_{d-1}(\epsilon) = 0$,
and $\delta_d(\epsilon) = \epsilon$. In FICM we have

    $\delta_k(\epsilon) = (1 - \epsilon)^{d-k}\epsilon^{k}$,   $k = 0, 1, \ldots, d$.

The (generalized) influence function $\mathrm{IF}(\mu, z)$ is defined as

(5)    $\mathrm{IF}(\mu, z) = \frac{\partial}{\partial\epsilon}\,\mu(H(\epsilon, z))\Big|_{\epsilon=0}$.

Observe that $H(\epsilon, z)$ and $\mathrm{IF}(\mu, z)$ also depend on $G_\epsilon$ and
$H_0$ but, for simplicity, this dependence is not reflected in our notation.
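In order to derive $\mathrm{IF}(\mu, z)$, let

(6)    $g(H, m, \Sigma) = E_H\{\psi(d^2(X, m, \Sigma))(X - m)\}$.

From (4) we have that

    $g(H(\epsilon, z), \mu(H(\epsilon, z)), \Sigma(H(\epsilon, z))) = 0$

or, equivalently,

(7)    $\delta_0(\epsilon)\, g(H_0, \mu(H(\epsilon, z)), \Sigma(H(\epsilon, z)))
        + \sum_{k=1}^{d} \delta_k(\epsilon) \sum_{I \in \mathcal{I}_k} g(H(I, z), \mu(H(\epsilon, z)), \Sigma(H(\epsilon, z))) = 0$,

where $\mathcal{I}_k = \{I = \{i_1, \ldots, i_k\} : i_1 < \cdots < i_k\}$,
$1 \le k \le d$, and where $H(I, z)$ is the distribution function of
$X = (X_1, \ldots, X_d)$ where $X_i = z_i$ if $i \in I$ and $X_i = Y_i$ if
$i \notin I$. In particular, $\mathcal{I}_d = \{\{1, 2, \ldots, d\}\}$ and
$H(\{1, 2, \ldots, d\}, z) = \Delta_z$, a point mass distribution at $z$.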

The influence function (5) will now be obtained by differentiating (7) at
$\epsilon = 0$. In order to do so we must assume that $\Sigma$ is Fisher
consistent at the core model $H_0$ [so $\Sigma(H(0, z)) = \Sigma_0$] and that
$\Sigma(H(\epsilon, z))$ is differentiable with respect to $\epsilon$ at
$\epsilon = 0$. When performing the differentiation it is important to
notice that

(8)    $g(H_0, \mu(H(\epsilon, z)), \Sigma(H(\epsilon, z)))\big|_{\epsilon=0} = g(H_0, \mu_0, \Sigma_0) = 0$.

Moreover, in the Appendix we show that when $H_0$ is elliptically symmetric,

(9)    $\frac{\partial}{\partial\epsilon}\, g(H_0, \mu(H(\epsilon, z)), \Sigma(H(\epsilon, z)))\Big|_{\epsilon=0} = -A\,\mathrm{IF}(\mu, z)$,

where $A$ is a constant that does not depend on $\mu_0$ and $\Sigma_0$.

Under FDCM, we have $\delta_i(\epsilon) = \delta_i'(\epsilon) = 0$ for
$i = 1, \ldots, d - 1$, $\delta_d(\epsilon) = \epsilon$ and $\delta_d'(0) = 1$. So,
using (7), (8) and (9) we obtain

    $\frac{\partial}{\partial\epsilon}\, g[H(\epsilon, z), \mu(H(\epsilon, z)), \Sigma(H(\epsilon, z))]\Big|_{\epsilon=0} = -A\,\mathrm{IF}(\mu, z) + g(\Delta_z, \mu_0, \Sigma_0) = 0$,

where $\Delta_z$ is a point-mass distribution at $z$. Therefore,

(10)   $\mathrm{IF}(\mu, z) = \frac{1}{A}\, g(\Delta_z, \mu_0, \Sigma_0) = \frac{1}{A}\, \psi(d^2(z, \mu_0, \Sigma_0))(z - \mu_0)$.

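Under FICM we have that $\delta_0(0) = \delta_1'(0) = 1$, $\delta_1(0) = 0$, and
$\delta_i(0) = \delta_i'(0) = 0$ (for $i \ge 2$). So, again using (7), (8) and
(9) we have

    $\frac{\partial}{\partial\epsilon}\, g[H(\epsilon, z), \mu(H(\epsilon, z)), \Sigma(H(\epsilon, z))]\Big|_{\epsilon=0} = -A\,\mathrm{IF}(\mu, z) + \sum_{k=1}^{d} g(H(I_k, z), \mu_0, \Sigma_0) = 0$,

where $I_k = \{k\}$. Therefore,

(11)   $\mathrm{IF}(\mu, z) = \frac{1}{A} \sum_{k=1}^{d} g(H(I_k, z), \mu_0, \Sigma_0)$.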

Remark 1. It is worth noticing that under all the considered models
the corresponding influence functions can be interpreted as directional
(Gateaux) derivatives. It is well known that in the FDCM case the
derivative is in the direction of $\Delta_z$, a point-mass distribution at
$z$. In the case of FICM the derivative is in the direction of

    $\frac{1}{d} \sum_{k=1}^{d} H(I_k, z)$,

where $H(I_k, z)$ is the distribution of the random vector $Y \sim H_0$ with
its $k$th component replaced by the constant $z_k$.



PROPAGATION OF OUTLIERS IN MULTIVARIATE DATA 






Illustrations. The effect of an infinitesimal amount of contamination on
an estimating functional critically depends on the type of contamination. In
the following examples we illustrate some of these differences.

Figure 1 compares the influence functions of an M-estimator functional
under FDCM and FICM. We consider the case where $Y$ is bivariate normal
with mean zero, variances 1 and correlation $r$. We use the M-estimator
based on Tukey's bisquare loss function
$\rho_c(t) = \min(3t^2/c^2 - 3t^4/c^4 + t^6/c^6, 1)$ for a fixed tuning
constant $c$. From Figure 1 we see that the influence functions are fully
redescending for FDCM [panel (a)], as is well known. However, for FICM the
influence functions are not redescending [panels (b) and (c)]. Therefore, a
vanishingly small fraction of large coordinatewise contamination may have a
persistent influence on the location M-estimate.
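
For readers who want to experiment with the bisquare loss, a minimal NumPy sketch of $\rho_c$ and its derivative $\psi_c = \rho_c'$ follows. The function names and the value $c = 3$ used in the printout are assumptions for illustration only, not the tuning used for Figure 1.

```python
import numpy as np

def rho_bisquare(t, c):
    """Tukey bisquare loss: min(3t^2/c^2 - 3t^4/c^4 + t^6/c^6, 1)."""
    t = np.minimum(np.abs(np.asarray(t, dtype=float)), c)  # clip at c, where rho reaches 1
    r2 = (t / c) ** 2
    return 3 * r2 - 3 * r2**2 + r2**3

def psi_bisquare(t, c):
    """Derivative of rho_bisquare; it vanishes for |t| > c (redescending)."""
    t = np.asarray(t, dtype=float)
    inside = np.abs(t) <= c
    return np.where(inside, (6 * t / c**2) * (1 - (t / c) ** 2) ** 2, 0.0)

c = 3.0  # hypothetical tuning constant
print(rho_bisquare([0.0, 1.0, 3.0, 10.0], c))
print(psi_bisquare([0.0, 1.0, 3.0, 10.0], c))
```

Because $\psi_c$ is zero beyond $c$, a single far-away point has no influence under FDCM, which is the redescending behavior seen in panel (a).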

Since FICM is not affine equivariant, as discussed further in the next
sections, the influence function changes with the amount of correlation $r$.
Contrary to the classical contamination case, this change is not just a
linear transformation by $\Sigma_0^{1/2}$. This is illustrated in panel (b)
($r = 0$) and panel (c) ($r = 0.9$) of Figure 1. Note that if $r = 0$, then
the components are almost solely influenced by contamination in the
corresponding component, while in the correlated case they are influenced
by contamination in both components. In contrast, the influence function of
the coordinatewise M-estimator is the same under FDCM and FICM and does not
change with correlation. It closely resembles the IF of the bivariate
M-estimator under FICM with $r = 0$ (Figure 1b). As can be expected, the
components of the coordinatewise M-estimator are only influenced by
contamination in the corresponding component, regardless of the value of $r$.

Figure 2 shows the effect of the dimension $d$ on the GES of multivariate
location estimators under FDCM and FICM when the core model is
multivariate standard normal. We compared the affine equivariant
multivariate S-location estimator with Tukey loss function and the
corresponding coordinatewise S-estimator. Under FDCM the truncation
parameter in the Tukey bisquare loss function determines the breakdown
point of the S-estimator [see, e.g., Lopuhaa (1989)]. For the multivariate
location S-estimator we have chosen the value of the truncation parameter
that yields a 50% breakdown point under FDCM. Similarly, for the
coordinatewise S-estimator we have selected the value of the truncation
parameter that yields a 50% breakdown point for each coordinate separately.
Figure 2 clearly shows that FDCM severely underestimates the maximal
influence of a vanishingly small fraction of contamination on the
multivariate S-location estimator when $d$ is large. Note that the
coordinatewise estimator has the same GES under FDCM and FICM and
considerably smaller GES than its multivariate counterpart under FICM.

Fig. 1. Influence functions for Tukey bisquare M-estimator of bivariate
location. Left panels are for the first component, right panels are for the
second component. Panel (a) FDCM; Panel (b) FICM with r = 0; Panel (c) FICM
with r = 0.9.

Fig. 2. Gross error sensitivity for Tukey bisquare 50% breakdown
S-estimator of multivariate location and corresponding 50% breakdown
coordinatewise S-estimator (horizontal axis: dimension).

4. Propagation of outliers. FDCM is translation-scale equivariant and
affine equivariant. Therefore, if a random vector $X$ follows this model,
then an affine transformation $\tilde{X} = AX + b$ will also follow the
model, for any invertible matrix $A$ and vector $b$. In particular, if $X$
has a probability $\epsilon$ of contamination, the same probability holds for
$\tilde{X}$. On the other hand, the independent contamination model is not
affine equivariant. In fact, suppose that the random vector $X$ follows the
FICM and $A$ is an invertible $d \times d$ matrix; then the transformed
vector

    $\tilde{X} = AX + b = A(I - B)Y + ABZ + b$

is in general different from $(I - B)AY + BAZ + b$, unless $AB = BA$ (i.e.,
$A$ is diagonal). Therefore, $\tilde{X}$ does not follow the independent
contamination model.
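
The role of the commutation condition $AB = BA$ can be seen on a single observation. In the sketch below, the matrix `A`, the contamination pattern `B` and the outlier value 10 are hypothetical choices made for illustration (with $b = 0$); they are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 3
A = np.array([[2.0, 1.0, 1.0],
              [1.0, 2.0, 1.0],
              [1.0, 1.0, 2.0]])   # invertible but not diagonal (assumed example)
B = np.diag([1.0, 0.0, 0.0])      # only the first cell is contaminated
y = rng.normal(size=d)            # clean observation
z = np.full(d, 10.0)              # outlier values

x = (np.eye(d) - B) @ y + B @ z   # observed vector, one bad cell
x_tilde = A @ x                   # affine transformation with b = 0

# Count cells that differ from their uncontaminated counterparts.
n_hit_before = int(np.sum(~np.isclose(x, y)))
n_hit_after = int(np.sum(~np.isclose(x_tilde, A @ y)))
commutes = np.allclose(A @ B, B @ A)
print(n_hit_before, n_hit_after, commutes)
```

One contaminated cell before the transformation becomes three affected cells afterwards, because this `A` does not commute with `B`; a diagonal `A` would leave the contamination pattern unchanged.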

The lack of affine equivariance of FICM causes a phenomenon that we
call outlier propagation. FICM assumes that each column in the data table
contains an average fraction $\epsilon$ of contamination. Since affine
transformations linearly combine the columns, the independent
contamination property is lost.

To illustrate this, we generated a small two-dimensional data set of size
$n = 20$. Both components come from a standard Gaussian distribution and
we added independent contamination to each component with a
contamination probability of 30%. The contaminated data come from a
Gaussian distribution with mean 10 and variance 1. Histograms of the
original components $X_1$ and $X_2$ are shown in the top panels of Figure 3.
Both histograms show a clear majority of clean data with approximately 1/3
of outlying points on the right. The thick vertical lines indicate the
medians, 0.22 for $X_1$ and 0.95 for $X_2$. The medians are slightly
affected by the heavy contamination but still summarize well the majority
of the data. Now, we consider
an affine transformation: $L_1 = 0.64X_1 + 0.77X_2$ and $L_2 = 0.78X_1 + 0.62X_2$.
Histograms of the components $L_1$ and $L_2$ are shown in the bottom panels
of Figure 3. From these histograms it is clear that both components now
contain a majority of contaminated cells and hence do not satisfy FICM
with 30% contamination anymore. In fact, we have three distinct groups in
each dimension consisting of 49%, 42% and 9% of the data. Note that the
medians of $L_1$ and $L_2$ no longer reflect the location of the clean data.
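
The group sizes of 49%, 42% and 9% correspond to cases with zero, one and two contaminated cells, that is, $0.7^2$, $2 \cdot 0.7 \cdot 0.3$ and $0.3^2$. The sketch below reproduces them by simulation. It uses a much larger sample than the $n = 20$ of the example and a hypothetical linear combination $X_1 + X_2$, both chosen only to make the effect easy to see.

```python
import numpy as np

def simulate_ficm(n, d, eps, rng, shift=10.0):
    """Cellwise contamination: each cell is replaced by N(shift, 1) with prob. eps."""
    y = rng.normal(size=(n, d))               # clean standard Gaussian data
    b = rng.random((n, d)) < eps              # independent contamination indicators
    z = rng.normal(loc=shift, size=(n, d))    # contaminating values
    return np.where(b, z, y), b

rng = np.random.default_rng(2)
x, b = simulate_ficm(200_000, 2, 0.30, rng)

cells_hit = b.sum(axis=1)                     # contaminated cells per case: 0, 1 or 2
group_fractions = np.bincount(cells_hit, minlength=3) / len(cells_hit)
l1 = x[:, 0] + x[:, 1]                        # hypothetical linear combination

print("fractions with 0/1/2 bad cells:", group_fractions)
print("median of X1:", np.median(x[:, 0]), "median of L1:", np.median(l1))
```

Since only about 49% of the cases are fully clean, the median of the linear combination is driven by contaminated cases, whereas the median of each original column is not.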

Fig. 3. FICM outliers propagated by linear combinations (panels: X1, X2,
L1, L2).

Data following FICM and other nonaffine equivariant versions of model
(2) can severely upset standard, affine equivariant robust procedures. To
illustrate this, we consider the following example. We generated 100
observations from a 15-dimensional standard Gaussian distribution and added
independent contamination to each column with a contamination probability
of $\epsilon = 15\%$. The contamination is obtained by adding a constant $t$
to the generated values. The overall probability of a contaminated cell is
thus 15%, which is reasonably low, so one might expect to obtain reliable
estimates if a robust estimator is used. However, the probability that an
observation is contaminated equals $1 - (1 - \epsilon)^d > 90\%$. By applying
a linear transformation to these data, the outlier propagation effect can
spread contamination in one of the components of an observation over all
its components. This results in transformed data with a contamination
probability of more than 90% for each cell. In such a setting, no robust
estimator can be expected to be reliable. However, for affine equivariant
robust estimators the original and linearly transformed data sets are
equivalent, which has a devastating effect on their performance in this
setting, as illustrated in Figure 4. In this figure we generated data sets
with 15% of independent contamination in each column, as explained above.
We varied the size of the contamination constant $t$ from 0 to 100. We
calculated the multivariate location of the data using the sample mean,
the coordinatewise median and three affine equivariant robust location
estimators: (i) the Minimum Volume Ellipsoid (MVE), (ii) the Minimum
Covariance Determinant (MCD) [both proposed by Rousseeuw (1984)] and (iii)
the Stahel-Donoho estimator, independently proposed by Stahel (1981) and
Donoho (1982). If an estimator is not affected by contamination, then all
components of the location vector should be close to zero. On the other
hand, if the contamination affects the estimator, then some components of
the location vector will become biased. For each estimator, we plotted the
largest componentwise bias of the estimated location vector against the
size of the contamination. Note that the bias of both MVE and MCD
increases without bound. The bias of the Stahel-Donoho estimator as
implemented in S-PLUS increases even faster and the estimator crashes when
the contamination constant exceeds 7. The three affine equivariant robust
estimates show clear signs of breaking down. Not surprisingly, so does the
sample mean. On the other hand, the coordinatewise median is hardly
affected by the outliers in each component. This example clearly shows
that robust affine equivariant methods are not robust against propagation
of outliers. (A more rigorous treatment of this claim will be given in the
next section.) Hence, these methods are not well suited for situations
where the contamination regime operates on individual variables (columns)
rather than individual cases (rows).
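
A reduced version of this experiment can be sketched with NumPy alone. The code below covers only the sample mean and the coordinatewise median; the MVE, MCD and Stahel-Donoho estimators are omitted because they require dedicated implementations. The function name `max_bias` and the seed are assumptions for the example.

```python
import numpy as np

def max_bias(t, n=100, d=15, eps=0.15, seed=0):
    """Largest componentwise bias of mean and median when t is added to bad cells."""
    rng = np.random.default_rng(seed)
    y = rng.normal(size=(n, d))                  # clean data, true location is zero
    b = rng.random((n, d)) < eps                 # independent cellwise indicators
    x = y + t * b                                # contamination adds the constant t
    mean_bias = np.max(np.abs(x.mean(axis=0)))
    median_bias = np.max(np.abs(np.median(x, axis=0)))
    contaminated_rows = b.any(axis=1).mean()     # share of cases with >= 1 bad cell
    return mean_bias, median_bias, contaminated_rows

for t in (0.0, 10.0, 100.0):
    print(t, max_bias(t))
print("theoretical share of contaminated cases:", 1 - (1 - 0.15) ** 15)
```

In this sketch the bias of the mean grows roughly linearly in $t$, the coordinatewise median stays bounded, and about 91% of the cases contain at least one contaminated cell.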

5. Affine equivariance and independent contamination. For simplicity,
we will keep the Section 3 assumption that the marginal probabilities of a
contaminated cell are equal for all components, that is,
$P(B_1 = 1) = \cdots = P(B_d = 1) = \epsilon$. However, with obvious
modifications the results hold for the general case as well.

For each distribution $G_0$ on $\mathbb{R}$ with finite first moment, let
$\mathcal{G}_h(G_0)$ be the set of distribution functions $G$ on
$\mathbb{R}^d$ whose marginal distributions are all stochastically larger
than $G_0(x - h)$. For each $\delta > 0$ set

    $\mathcal{F}_{\delta,h}(G_0) = \{H = (1/2 - \delta)H_0 + (1/2 + \delta)G,\ G \in \mathcal{G}_h(G_0)\}$.

Fig. 4. Numerical example: breakdown under 15% independent contamination
(horizontal axis: contamination size). Affine equivariant, high
breakdown-point estimators try to identify outlying cases and break down
when more than 50% of the cases are contaminated, which can easily occur
with small fractions of independent contamination in the variables when the
dimension is moderately large.

Definition 1. Let $T = (T_1, \ldots, T_d)$ be an equivariant multivariate
location estimating functional on $\mathbb{R}^d$. We say that $T$ is
$\delta$-consistent at infinity, when the central model is $H_0$, if for any
distribution $G_0$

    $\lim_{h \to \infty}\ \inf_{H \in \mathcal{F}_{\delta,h}(G_0)} T_i(H) = +\infty$,   $1 \le i \le d$.

In other words, $\delta$-consistent estimates have the property that if at
least $1/2 + \delta$ of the mass goes to infinity for all the coordinates,
then all the coordinates of the estimate go to infinity too. Note that
$\delta_2 > \delta_1$ implies
$\mathcal{F}_{\delta_2,h}(G_0) \subset \mathcal{F}_{\delta_1,h}(G_0)$, thus if $T$
is $\delta_1$-consistent, then it is also $\delta_2$-consistent.

Let us introduce the following notation. Given a distribution $H_0$ on
$\mathbb{R}^d$, denote by $\mathcal{F}^{\mathrm{FICM}}_\epsilon$ its FICM
contamination neighborhood of size $\epsilon$ that contains all the
distributions of $X = (I - B)Y + BZ$ where $Y$, $B$ and $Z$ are
independent, $Y$ has distribution $H_0$, $B$ is a diagonal matrix where the
diagonal elements $B_1, \ldots, B_d$ are independent Bernoulli variables
such that $P(B_i = 1) = \epsilon$ and $Z$ has an arbitrary distribution
$H^*$. We denote by $\mathcal{F}^{\mathrm{FDCM}}_\epsilon$ its FDCM
contamination neighborhood that contains the distributions of the form

    $H = (1 - \epsilon)H_0 + \epsilon H^*$,

where $H^*$ is arbitrary.

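We can now define the breakdown point under FICM,
$\epsilon^*_{\mathrm{FICM}}$, of a multivariate location estimator $T(H)$ as
the smallest probability $\epsilon$ of contamination in each of the
components that is needed to make $\|T(H)\|$ arbitrarily large. That is,

    $\epsilon^*_{\mathrm{FICM}}(T, H_0) = \inf\{\epsilon > 0 : \sup_{H \in \mathcal{F}^{\mathrm{FICM}}_\epsilon} \|T(H)\| = +\infty\}$,

where the supremum is taken over the FICM contamination neighborhood of
size $\epsilon$.

Theorem 1 shows that the FICM breakdown point of any equivariant
estimate of location which is $\delta$-consistent at infinity under FICM is
at most $1 - (1/2 - \delta)^{1/d}$. Hence, if $\delta$ is independent of $d$,
the FICM breakdown point tends to 0 as $d$ increases.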



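Theorem 1. Let $T(H)$ be an affine equivariant multivariate location
estimator that is $\delta$-consistent at infinity for the central
distribution $H_0$, with finite first moments. If

    $\epsilon > \epsilon_0 = 1 - (1/2 - \delta)^{1/d}$

then

    $\sup_{H \in \mathcal{F}^{\mathrm{FICM}}_\epsilon} \|T(H)\| = +\infty$,

where the supremum is over the FICM contamination neighborhood of size
$\epsilon$. Hence,

    $\epsilon^*_{\mathrm{FICM}}(T, H_0) \le 1 - (1/2 - \delta)^{1/d}$.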



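Proof. Consider the linear transformation $U = AX$ with

    $A = \begin{pmatrix} 2 & 1 & \cdots & 1 \\ 1 & 2 & \cdots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \cdots & 2 \end{pmatrix}$.

Note that $A$ is invertible since its eigenvalues are $d + 1$ with
multiplicity one and 1 with multiplicity $(d - 1)$.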
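
The matrix with 2 on the diagonal and 1 elsewhere can be written as $A = I + \mathbf{1}\mathbf{1}'$, so its spectrum is easy to check numerically. The sketch below does this for $d = 5$; the helper name `proof_matrix` is hypothetical.

```python
import numpy as np

def proof_matrix(d):
    """Matrix with 2 on the diagonal and 1 elsewhere, i.e. I + 1 1'."""
    return np.eye(d) + np.ones((d, d))

d = 5
A = proof_matrix(d)
eigvals = np.linalg.eigvalsh(A)   # A is symmetric; eigenvalues in ascending order
print(eigvals)
print("determinant:", np.linalg.det(A))
```

The eigenvalue 1 appears $d - 1$ times and the remaining eigenvalue is $d + 1$, so the determinant equals $d + 1$ and $A$ is invertible for every $d$.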

Let $H_h \in \mathcal{F}^{\mathrm{FICM}}_\epsilon$ where $\epsilon > \epsilon_0$ and
$Z \sim \delta_h$ with $\delta_h$ the point mass at
$(h, \ldots, h) \in \mathbb{R}^d$. It follows that with probability
$(1 - \epsilon)^d = 1/2 - \delta^*$, with $\delta^* > \delta$, the vector $X$
comes from $H_0$, and thus with probability $1/2 + \delta^*$ at least one
component of $X$ is equal to $h$.

Let $\tilde{H}_h$ and $\tilde{H}_0$ be the distributions of $U$ when $X$ has
distribution $H_h$ and $H_0$, respectively. Then

    $\tilde{H}_h = (1/2 - \delta^*)\tilde{H}_0 + (1/2 + \delta^*)G_h$,

where $G_h$ is the distribution of $U$ when $X$ has distribution $H_h$
conditionally on $\sum_{i=1}^{d} B_i > 0$. Therefore, all the marginals of
$G_h$ are stochastically larger than $G_0(u - h)$ where $G_0$ is the
distribution of $-2\sum_{j=1}^{d} |Y_j|$ with $Y \sim H_0$. Since $T$ is
$\delta^*$-consistent at infinity, we then have

    $\lim_{h \to \infty} \|T(\tilde{H}_h)\| = +\infty$.

Since $A$ is invertible and $T$ is affine equivariant,

    $\lim_{h \to \infty} \|T(H_h)\| = \lim_{h \to \infty} \|A^{-1}T(\tilde{H}_h)\| = +\infty$,

proving the theorem. □

It is obvious that a scatter estimate breaks down whenever the multivari- 
ate location estimate it is using to center the data breaks down. Therefore, 
although Theorem 1 is stated for multivariate location, it has clear implica- 
tions for the companion scatter estimates. 

The following lemma is proven in the Appendix and will be used to show
that many of the well-known affine equivariant robust estimators of
multivariate location are $\delta$-consistent at infinity.

Lemma 1. Suppose that $T(H)$ is location-scale equivariant and can be
represented as a weighted average, that is, it can be written as

(12)   $T(H) = E_H(w(H, X)\,X)$,

where the weight function $w(H, x)$ satisfies: (i) $w(H, x) \ge 0$, (ii)
there exists $K$ such that $w(H, x) \le K$ and (iii) there exists
$\eta > 0$ such that $P_H(w(H, X) > \eta) > 1/2 - \delta_0$ for some
$\delta_0 > 0$. Then $T$ is $\delta$-consistent at infinity when the central
model distribution $H_0$ has finite first moments, for all
$\delta > \delta_0$.

Examples of $\delta$-consistency at infinity. The following examples
illustrate how Lemma 1 can be used to show $\delta$-consistency at infinity
for well-known affine equivariant high-breakdown (under FDCM) estimators of
multivariate location. Table 1 shows that for higher dimensions
($d \ge 10$) a small amount of contamination in each variable suffices to
break down such estimators.

Table 1
Minimal fraction of independent contamination that causes breakdown of
$\delta$-consistent, affine equivariant estimators

Dimension     1     2     3     4     5     10    15    20    100
$\epsilon$    0.50  0.29  0.21  0.16  0.13  0.07  0.05  0.03  0.01


Coordinatewise mean and median. It is clear that the sample mean
satisfies (12) with weights $w(H, x) = 1$, hence the assumptions of Lemma 1
hold in this case. Although the coordinatewise median does not satisfy the
assumptions of Lemma 1, a simple argument shows that it is
$\delta$-consistent at infinity for all $\delta > 0$. Using the notation
introduced in the proof of Lemma 1, we have that
$P_{G_h}(X_i \le v_h) \to 0$ and so
$\lim_{h \to \infty}[(1/2 - \delta)H_{0i}(v_h) + (1/2 + \delta)G_{hi}(v_h)] < 1/2$.
Therefore, $\lim_{h \to \infty} \mathrm{Med}(H_{hi}) = \infty$. Note, however,
that the coordinatewise median is not affine equivariant and thus Theorem 1
does not apply in this case.

Minimum Covariance Determinant. The Minimum Covariance Determinant
estimator (MCD) of multivariate location, introduced by Rousseeuw (1984),
is defined as a scaled weighted mean $T_{\mathrm{MCD}}(H)$ with weight
$w(x, H) = I_{A^*}(x)/P_H(A^*)$. The set $A^*$ is determined as follows.
Let $\mu(H, A) = \int_A x\,dH(x)/P_H(A)$ be the mean associated to any
subset $A \subset \mathbb{R}^d$. Then $A^*$ is such that its covariance
matrix
$\Sigma(H, A^*) = \int_{A^*} (x - \mu(H, A^*))(x - \mu(H, A^*))'\,dH(x)$ has
smallest determinant among all subsets $A$ such that $P_H(A) \ge 1/2$.
Clearly, the weights are nonnegative and bounded. Moreover, since
$1 \le w(x, H) \le 2$ for all $x \in A^*$, we have that
$P(w(H, x) > \eta) > 1/2 - \delta_0$ for any $\eta < 1$ and $\delta_0 > 0$.
Then $T_{\mathrm{MCD}}$ satisfies the assumptions of Lemma 1 and is
$\delta$-consistent at infinity for any $\delta > 0$.

S-estimators. Consider a function $\rho: \mathbb{R} \to \mathbb{R}^+$ that
satisfies the following assumptions:

A1. $\rho$ is even, bounded and nondecreasing on $[0, \infty)$ with
    $\rho(0) = 0$. Without loss of generality we will take
    $\rho(\infty) = 1$.
A2. $\rho$ is differentiable, $\psi(t) = \rho'(t)$ is differentiable at 0,
    and $u(t) = \psi(t)/t$ is nonincreasing on $[0, \infty)$. We will also
    assume that $\rho(u) < 1$ implies $\psi(u) > 0$.

Then $(T(H), S(H))$ is defined by the values $(\mu, \Sigma)$ satisfying

(13)   $(T(H), S(H)) = \arg\min \det(\Sigma)$

subject to

(14)   $E_H(\rho(d(x, \mu, \Sigma)/s_0)) = b$.

It can be shown [see, e.g., Davies (1987)] that if $\rho$ is differentiable
then $T(H)$ satisfies the following equation

(15)   $T(H) = E_H(w(X, H)\,X)$

with

(16)   $w(x, H) = \dfrac{u(d(x, T(H), S(H)))}{E_H(u(d(X, T(H), S(H))))}$

and

(17)   $u(x, H) = \dfrac{\psi(d(x, T(H), S(H)))}{d(x, T(H), S(H))}$.

The following lemma (proven in the Appendix) shows that S-estimators are
$\delta$-consistent at infinity for any $\delta > 0$.



Lemma 2. Suppose A1 and A2 are satisfied. Then the weight function
$w(x, H)$ associated with the S-location estimate $T(H)$ with $b = 1/2$
satisfies assumptions (i), (ii) and (iii) of Lemma 1, for any $\delta > 0$.

Lemma 2 can be extended to $\tau$-estimates of multivariate location as
defined by Lopuhaa (1991). In addition, a simple argument shows that
Rousseeuw's minimum volume ellipsoid is also $\delta$-consistent at infinity
for any $\delta > 0$ [details can be found in Alqallaf et al. (2006)].

6. Concluding remarks. FDCM assumes that the majority of the cases is 
clean and follows the underlying model. Robust methods developed for this 
model exploit the fact that the fraction of clean cases remains constant un- 
der affine transformations and concentrate on identifying and downweight- 
ing the minority of outlying cases. In fact, the maximal breakdown point 
of any affine equivariant robust estimator cannot exceed 50%, as shown by 
Lopuhaa and Rousseeuw (1991). On the other hand, in Section 4 we have 
shown that the fraction of outlying cells in FICM can drastically change 
under affine transformations. Consequently, as demonstrated in Section 5, 
data following FICM and other nonaffine equivariant versions of model (2) 
can severely upset standard robust procedures, even if the fraction of con- 
taminated cells in the data is quite low. 
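A small calculation makes the last point concrete. The sketch below assumes, as in FICM, that each of the $d$ cells of a case is contaminated independently with probability $\varepsilon$, so that a case contains at least one contaminated cell with probability $1 - (1 - \varepsilon)^d$. The function names are hypothetical.

```python
def contaminated_case_fraction(eps, d):
    """Probability that a case has at least one contaminated cell when each
    of its d cells is contaminated independently with probability eps."""
    if not 0.0 <= eps <= 1.0:
        raise ValueError("eps must lie in [0, 1]")
    return 1.0 - (1.0 - eps) ** d

def smallest_dimension_exceeding(eps, threshold=0.5):
    """Smallest d for which the expected fraction of affected cases exceeds threshold."""
    if not 0.0 < eps <= 1.0:
        raise ValueError("eps must lie in (0, 1]")
    if not 0.0 <= threshold < 1.0:
        raise ValueError("threshold must lie in [0, 1)")
    d = 1
    while contaminated_case_fraction(eps, d) <= threshold:
        d += 1
    return d
```

For example, with $\varepsilon = 0.05$ the expected fraction of cases containing a contaminated cell exceeds one half once $d \ge 14$, since $0.95^{14} \approx 0.488$.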

In practice, both componentwise outliers and structural outliers can occur
simultaneously. This situation is considered by the partially spoiled
independent contamination model (PSICM), which assumes that there is a certain
probability $\alpha(\varepsilon)$ that the case is fully spoiled (as in the
FDCM), but otherwise the cells are independently contaminated with probability
$\beta(\varepsilon)$. Take, for example,
$\alpha(\varepsilon) = \varepsilon/(2 - \varepsilon)$ and
$\beta(\varepsilon) = \varepsilon/2$.

Similarly, the partially clean independent contamination model (PCICM)
assumes that a case is free of contamination with a certain probability
$1 - \alpha(\varepsilon)$ (as in the FDCM), but otherwise the different cells
are independently contaminated with probability $\beta(\varepsilon)$. Two
possible choices for the functions $\alpha(\varepsilon)$ and
$\beta(\varepsilon)$ are: (i) $\alpha(\varepsilon) = \gamma$ and
$\beta(\varepsilon) = \varepsilon/\gamma$, for some $0 < \gamma < 1$, and
(ii) $\alpha(\varepsilon) = \beta(\varepsilon) = \sqrt{\varepsilon}$.
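The example choices of $\alpha(\varepsilon)$ and $\beta(\varepsilon)$ given above for PSICM and PCICM can be checked directly. The sketch below computes the marginal probability that a single cell is contaminated under each model, reading the definitions as stated in the text. The function names are hypothetical.

```python
import math

def psicm_cell_probability(alpha, beta):
    """PSICM: the case is fully spoiled with probability alpha; otherwise each
    cell is contaminated independently with probability beta."""
    return alpha + (1.0 - alpha) * beta

def pcicm_cell_probability(alpha, beta):
    """PCICM: the case is entirely clean with probability 1 - alpha; otherwise
    each cell is contaminated independently with probability beta."""
    return alpha * beta

def example_cell_probabilities(eps, gamma):
    """Cell contamination probabilities for the three example choices in the text."""
    return {
        "PSICM": psicm_cell_probability(eps / (2.0 - eps), eps / 2.0),
        "PCICM (i)": pcicm_cell_probability(gamma, eps / gamma),
        "PCICM (ii)": pcicm_cell_probability(math.sqrt(eps), math.sqrt(eps)),
    }
```

For any $0 < \varepsilon \le \gamma < 1$, all three entries equal $\varepsilon$ up to rounding. Choice (i) needs $\varepsilon \le \gamma$ for $\beta(\varepsilon)$ to be a probability.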

The choices of the functions $\alpha(\varepsilon)$ and $\beta(\varepsilon)$ in
PCICM (i) and (ii) and PSICM are such that the probability of contamination of
a single cell is still $P(B_i = 1) = \varepsilon$ for all $i$. Therefore,
meaningful sensitivity analysis can be performed by letting
$\varepsilon \to 0$. Another important simplifying feature of these
contamination models is that the probability that a case has exactly $k$
contaminated cells does not depend on which $k$ components of the observation
are contaminated. The influence function of multivariate location
M-estimators under PCICM (i), (ii) and PSICM can be derived similarly
as in Section 3 [see Alqallaf et al. (2006) for details]. The influence function
under PCICM (i), (ii) turns out to be the same as under FICM and thus is
given by (11). The influence function under PSICM becomes

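$\mathrm{IF}(T, \mathbf{z}) = \dfrac{1}{2c}\left[\sum_{k=1}^{d} g(F(k, \mathbf{z}), \mu_0, \Sigma_0) + g(\Delta_{\mathbf{z}}, \mu_0, \Sigma_0)\right],$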

which is the average of the influence functions under FICM and FDCM. Note 
that both PCICM and PSICM contain independent componentwise contam- 
ination, so the outlier propagation effect occurs in both models. However, 
the effect will be more devastating in PSICM, where no clean cases are guar- 
anteed. If the fraction of clean cases in PCICM is sufficiently large (at least 
50%), then standard affine equivariant robust estimators will show good 
behavior under this model. 

Ideally, robust methods should be resistant against all kinds of outliers.
However, He and Simpson (1993) showed that the maximal contamination 
bias of locally linear estimators has to increase with dimension. Moreover, 
Theorem 1 shows that under FICM the breakdown point of affine equivariant 
estimators decreases with dimension. These results imply that it is intrin- 
sically difficult to find estimators in high dimensions that are sufficiently 
robust against all types of outliers. Hence, one has to make a trade-off be- 
tween several desirable (robustness) properties that cannot all be achieved 
simultaneously. 

Protection against outlier propagation can be achieved by using
coordinatewise procedures, such as the (coordinatewise) median, that operate
on only one column at a time. Croux et al. (2003) and Maronna and Yohai
(2008) use such coordinatewise procedures to construct robust methods for
factor models and principal components, respectively. See Liu et al. (2003)
for an application to microarray data. However, a well-known weakness of
coordinatewise methods is their lack of robustness against structural
outliers. This type of outlier can only be handled by robust affine equivariant
methods. One possible way to address this trade-off is to apply robust affine
equivariant methods to subsets of columns, one subset at a time, and combine
the results. With larger subset sizes, more protection against structural
outliers is assured, but less protection against outlier propagation is
obtained, and vice versa.
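The contrast between casewise and columnwise views of the data can be seen in a small simulation. This is only an illustrative sketch under assumed settings: standard normal clean data, cells replaced independently with probability $\varepsilon = 0.1$ by the arbitrary value 1000, and hypothetical names throughout. It does not implement any affine equivariant estimator. It only shows that most cases contain a contaminated cell while each column still has a clean majority.

```python
import numpy as np

def contaminate_cells(x, eps, value, rng):
    """Replace each cell of x independently with probability eps by `value`.

    Returns the contaminated copy and the boolean mask of replaced cells.
    """
    mask = rng.random(x.shape) < eps
    out = x.copy()
    out[mask] = value
    return out, mask

rng = np.random.default_rng(0)
n, d, eps = 2000, 20, 0.1          # illustrative sample size, dimension, cell rate
clean = rng.standard_normal((n, d))
dirty, mask = contaminate_cells(clean, eps, value=1000.0, rng=rng)

# Fraction of cases (rows) with at least one contaminated cell.
affected_cases = mask.any(axis=1).mean()

# Columnwise estimates: the median sees only about eps outliers per column,
# whereas the mean is dragged toward the outlying value.
coordinatewise_median = np.median(dirty, axis=0)
sample_mean = dirty.mean(axis=0)
```

Under these settings the expected fraction of affected cases is $1 - 0.9^{20} \approx 0.88$, far above one half. Each column contains only about 10% outlying cells, so the coordinatewise median stays close to the true location of zero while the sample mean does not.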



APPENDIX 



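A.1. Derivation of (9). Since $g(H_0, \mu_0, \Sigma) = 0$ for all positive
definite matrices $\Sigma$ and elliptically symmetric distributions $H_0$, we
have that

$\dfrac{\partial}{\partial \Sigma}\, g(H_0, \mu_0, \Sigma)\Big|_{\Sigma = \Sigma_0} = 0.$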

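Hence,

(18) $\dfrac{\partial}{\partial \varepsilon}\, g(H_0, \mu_0, S(H_\varepsilon))\Big|_{\varepsilon = 0} = \dfrac{\partial}{\partial \Sigma}\, g(H_0, \mu_0, \Sigma)\Big|_{\Sigma = \Sigma_0}\, \dfrac{\partial}{\partial \varepsilon}\, S(H_\varepsilon)\Big|_{\varepsilon = 0} = 0.$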



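We also have

(19) $\dfrac{\partial}{\partial \mu}\, g(H_0, \mu, \Sigma_0)\Big|_{\mu = \mu_0} = -2E_{H_0}\bigl(\psi'(d^2(\mathbf{Y}, \mu_0, \Sigma_0))(\mathbf{Y} - \mu_0)(\mathbf{Y} - \mu_0)'\bigr)\Sigma_0^{-1} - E_{H_0}\bigl(\psi(d^2(\mathbf{Y}, \mu_0, \Sigma_0))\bigr) I.$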



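Let $\mathbf{w} = \Sigma_0^{-1/2}(\mathbf{Y} - \mu_0)$. Then $\mathbf{w}$ has
density given by (1) with $\mu_0 = 0$ and $\Sigma_0 = I$. Since $\mathbf{w}$ has
a spherical distribution, it holds that

(20) $E\bigl(\psi'(\|\mathbf{w}\|^2)\,\mathbf{w}\mathbf{w}'\bigr) = \dfrac{1}{d}\, E\bigl(\psi'(\|\mathbf{w}\|^2)\,\|\mathbf{w}\|^2\bigr) I.$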
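Identity (20) rests on a standard symmetry argument. A brief sketch follows, assuming only that $\mathbf{w}$ is spherically distributed and that the expectations involved are finite. The symbols $M$, $Q$ and $\lambda$ are notation introduced here for illustration.

```latex
% Sketch of the symmetry argument behind (20); M, Q and \lambda are illustrative.
Let $M = E\bigl(\psi'(\|\mathbf{w}\|^2)\,\mathbf{w}\mathbf{w}'\bigr)$.
For any orthogonal matrix $Q$, $Q\mathbf{w}$ has the same distribution as
$\mathbf{w}$ and $\|Q\mathbf{w}\| = \|\mathbf{w}\|$, so
\[
  M = E\bigl(\psi'(\|Q\mathbf{w}\|^2)\,Q\mathbf{w}\mathbf{w}'Q'\bigr) = Q M Q'
  \quad\text{for all orthogonal } Q .
\]
A symmetric matrix that commutes with every orthogonal matrix is a multiple of
the identity, so $M = \lambda I$ for some scalar $\lambda$. Taking traces,
\[
  d\,\lambda = \operatorname{tr}(M)
  = E\bigl(\psi'(\|\mathbf{w}\|^2)\,\operatorname{tr}(\mathbf{w}\mathbf{w}')\bigr)
  = E\bigl(\psi'(\|\mathbf{w}\|^2)\,\|\mathbf{w}\|^2\bigr),
\]
and hence $\lambda = \frac{1}{d}\,E\bigl(\psi'(\|\mathbf{w}\|^2)\,\|\mathbf{w}\|^2\bigr)$.
```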



From (19) and (20) we get

$\dfrac{\partial}{\partial \mu}\, g(H_0, \mu, \Sigma_0)\Big|_{\mu = \mu_0} = -cI,$

where the constant
$c = (2/d)E(\psi'(\|\mathbf{w}\|^2)\|\mathbf{w}\|^2) + E(\psi(\|\mathbf{w}\|^2))$
is independent of $\mu_0$ and $\Sigma_0$. Finally, from (18) we get (9).

A.2. Proof of Lemma 1. Suppose that $T(H)$ is not $\delta$-consistent at
infinity for some $\delta > \delta_0$. Then there exists a distribution $G_0$ on
$\mathbb{R}$, with finite first moment, and a sequence of distributions $G_h$
with marginals $G_{hi}(x_i) \le G_0(x_i - h)$, such that if we call
$H_h = (1/2 - \delta)H_0 + (1/2 + \delta)G_h$, then $T_i(H_h) < c$ for some $c$,
for all $h > 0$. Let

$H_h^*(\mathbf{x}) = H_h(\sqrt{h}\,\mathbf{x}) = (1/2 - \delta)H_0(\sqrt{h}\,\mathbf{x}) + (1/2 + \delta)G_h(\sqrt{h}\,\mathbf{x}).$

Then, by scale equivariance of $T(H)$,

(21) $T_i(H_h^*) < c/\sqrt{h} \to 0$ as $h \to \infty$.

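Observe that $M_h(\mathbf{x}) = H_0(\sqrt{h}\,\mathbf{x})$ converges weakly to
the point-mass distribution at zero, as $h \to \infty$. Moreover

$T_i(H_h^*) = (1/2 - \delta)\displaystyle\int x_i\, w(H_h^*, \mathbf{x})\, dM_h(\mathbf{x}) + (1/2 + \delta)\int x_i\, w(H_h^*, \mathbf{x})\, dG_h^*(\mathbf{x}),$

where $G_h^*(\mathbf{x}) = G_h(\sqrt{h}\,\mathbf{x})$. Since
$w(H_h^*, \mathbf{x}) \le K$ and $E_{H_0}(|X_i|) < \infty$ we have

$\left|\displaystyle\int x_i\, w(H_h^*, \mathbf{x})\, dM_h(\mathbf{x})\right| \le K \int |x_i|\, dM_h(\mathbf{x}) = \dfrac{K}{\sqrt{h}} \int |x_i|\, dH_0(\mathbf{x}) \to 0$ as $h \to \infty$.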

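On the other hand, if
$A_h = \{\mathbf{x} = (x_1, \ldots, x_d) : x_i > h,\ 1 \le i \le d\}$ then
$\lim_{h\to\infty} P_{G_h^*}(A_h) = 1$ and therefore
$\lim_{h\to\infty} P_{H_h^*}(A_h) \ge 1/2 + \delta$.

Note that by assumption (iii),
$P_{H_h^*}(w(H_h^*, \mathbf{x}) \ge \eta) \ge 1/2 - \delta_0$. Set

$B_h = A_h \cap \{\mathbf{x} : w(H_h^*, \mathbf{x}) \ge \eta\},$

then

$\lim_{h\to\infty} P_{H_h^*}(B_h) \ge \lim_{h\to\infty} P_{H_h^*}(\{\mathbf{x} : w(H_h^*, \mathbf{x}) \ge \eta\}) - \lim_{h\to\infty} P_{H_h^*}(A_h^c) \ge 1/2 - \delta_0 - (1/2 - \delta) = \delta - \delta_0 > 0.$

Since $\lim_{h\to\infty} P_{M_h}(B_h) = 0$, we have

$0 < \delta - \delta_0 \le \lim_{h\to\infty} P_{H_h^*}(B_h) = (1/2 - \delta)\lim_{h\to\infty} P_{M_h}(B_h) + (1/2 + \delta)\lim_{h\to\infty} P_{G_h^*}(B_h) = (1/2 + \delta)\lim_{h\to\infty} P_{G_h^*}(B_h).$

Therefore, $\lim_{h\to\infty} P_{G_h^*}(B_h) \ge \gamma = (\delta - \delta_0)/(1/2 + \delta)$. Then

(22) $\lim_{h\to\infty} \displaystyle\int x_i\, w(H_h^*, \mathbf{x})\, dG_h^*(\mathbf{x}) \ge \eta \lim_{h\to\infty} \sqrt{h} \int_{B_h} dG_h^*(\mathbf{x}) + \lim_{h\to\infty} \int_{B_h^c} x_i\, w(H_h^*, \mathbf{x})\, dG_h^*(\mathbf{x})$

$\qquad \ge \eta \lim_{h\to\infty} \sqrt{h} \displaystyle\int_{B_h} dG_h^*(\mathbf{x}) + K \lim_{h\to\infty} \int_{x_i < 0} \dfrac{x_i}{\sqrt{h}}\, dG_h(\mathbf{x}).$

Regarding the first term of the right-hand side of (22), we get

(23) $\eta \lim_{h\to\infty} \sqrt{h} \displaystyle\int_{B_h} dG_h^*(\mathbf{x}) = \eta \lim_{h\to\infty} \sqrt{h}\, P_{G_h^*}(B_h) \ge \gamma\eta \lim_{h\to\infty} \sqrt{h} = \infty.$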

Regarding the second term, first note that the distribution of
$X_i I(X_i < 0)$ under $G_h(\mathbf{x})$ is also stochastically larger than the
corresponding distribution under $G_0(x - h)$. Then, by the change of variable
$y = x - h$, and using this stochastic inequality we have

(24) $\displaystyle\int_{x_i < 0} x_i\, dG_h(\mathbf{x}) \ge \int_{x_i < 0} x_i\, dG_0(x_i - h) = \int_{y_i < -h} (y_i + h)\, dG_0(y_i) = \int_{y_i < -h} y_i\, dG_0(y_i) + h \int_{y_i < -h} dG_0(y_i).$

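The first term is uniformly bounded as follows

(25) $\left|\displaystyle\int_{y_i < -h} y_i\, dG_0(y_i)\right| \le \int_{y_i < -h} |y_i|\, dG_0(y_i) \le \int_{-\infty}^{\infty} |y_i|\, dG_0(y_i) < \infty.$

The second term tends to zero because $G_0$ has finite first moments and so

(26) $h \displaystyle\int_{y_i < -h} dG_0(y_i) = h\, P_{G_0}(Y_i < -h) \to 0.$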

By (23) the first term in (22) tends to $+\infty$. By (24), (25) and (26) the
second term in (22) is uniformly bounded. Therefore,

$\lim_{h\to\infty} T_i(H_h^*) = (1/2 + \delta) \lim_{h\to\infty} \displaystyle\int x_i\, w(H_h^*, \mathbf{x})\, dG_h^*(\mathbf{x}) = +\infty,$

contradicting (21).

A.3. Proof of Lemma 2. The weights $w(\mathbf{x}, H)$ in (16) are clearly
nonnegative. Moreover, $u(t)$ is bounded because by assumption (A2),
$u(t) \le u(0) = \psi'(0) = \kappa < \infty$. For any $0 < t < 1$, let
$B_t = \{\mathbf{x} : \rho(d(\mathbf{x}, T(H), S(H))/s_0) \le t\}$.
It follows from (13) and (14) that

$\dfrac{1}{2} \ge \displaystyle\int \rho(d(\mathbf{x}, T(H), S(H))/s_0)\, dH(\mathbf{x}) \ge (1 - P(B_t))\, t$

and so

(27) $P(B_t) \ge 1 - \dfrac{1}{2t}.$

Let $t_0$ be such that

(28) $1 - \dfrac{1}{2t_0} = \dfrac{1}{2} - \delta_0,$

that is, $t_0 = 1/(1 + 2\delta_0)$. Note that $0 < \delta_0 < 1/2$ implies that
$1/2 < t_0 < 1$. Moreover, combining (27) and (28) yields

(29) $P(B_{t_0}) \ge \dfrac{1}{2} - \delta_0.$

Let $r_0 = \rho^{-1}(t_0)$, then we can write
$B_{t_0} = \{\mathbf{x} : d(\mathbf{x}, T(H), S(H))/s_0 \le r_0\}$.
By monotonicity of $u(t)$, for all $\mathbf{x}$ in $B_{t_0}$,
$u(d(\mathbf{x}, T(H), S(H))/s_0) \ge u(r_0)$.
Put $C = u(r_0)$; since $1/2 < t_0 < 1$, using the assumption that
$\psi(u) > 0$ when $\rho(u) < 1$ we have $C > 0$. Then we can write
$u(d(\mathbf{x}, T(H), S(H))/s_0) \ge C$ for $\mathbf{x} \in B_{t_0}$, and then
$\kappa \ge E(u(d(\mathbf{X}, T(H), S(H))/s_0)) \ge C\,P(B_{t_0}) \ge C/4$.
Therefore, $w(\mathbf{x}, H) \le 4\kappa/C$ for all $\mathbf{x}$ and
$w(\mathbf{x}, H) \ge C/\kappa$ for $\mathbf{x} \in B_{t_0}$. Together with (29)
this means that assumptions (i)-(iii) of Lemma 1 are satisfied.